Introduction and summary


A central part of education reform today is the wide-ranging and unprecedented 
effort to either revamp existing teacher evaluation systems or develop and imple- 
ment entirely new systems. In the past three years, for example, 32 states and the 
District of Columbia have made some change to their state teacher evaluation 
policy, and 23 states currently require that teacher evaluations include objective 
evidence of student learning, up from only four states in 2009. 1 The success of this 
work will in large part be judged by the extent to which the resulting systems can 
evaluate teachers with rigor, objectivity, and in ways that differentiate teachers’ 
abilities to promote student learning. 

Meeting this high bar in our nation’s high schools poses especially difficult
challenges, and yet the stakes for doing so are enormous, a point brought home by the
extant research. One particular strand of research draws attention to the
importance of identifying and addressing teacher effectiveness within schools, where the
bulk of the variation in teacher effectiveness resides. 2 At the same time, research
indicates a clear and urgent need to accomplish this task in our nation’s high schools.

The argument for focusing attention at the high school level is three-pronged. 

First, the performance of high school students lags behind that of demographically 
similar students in the elementary and middle grades, which suggests that, relative 
to the earlier years, there is a heightened need for improving the quality of instruc- 
tion in high school. 3 Second, dropout decisions are made by students in their high 
school years, which means improving average teacher quality in high school is 
one potential avenue for addressing the stubbornly persistent dropout rate. The 
research-based linkage is that student engagement is related to dropping out and 
teachers’ behaviors and practices are, in turn, related to student engagement. 4 

Third, high school is our last line of defense for preparing students for college and
the world of work, and teachers are an obviously critical component of the
quality of that preparation. Students entering college lacking a solid high school
education often have to spend time in remedial college courses, a sidetrack associated
with fewer earned credits and a lower likelihood of graduating with a degree. 5 In
terms of labor market consequences, young people who enter today’s labor force
without basic academic skills or the ability to think critically and creatively, and who
are deficient in so-called “noncognitive” skills, are at a competitive disadvantage in
the global, information-age economy. 6

High-quality teacher evaluation systems are seen as one lever for improving the
teacher workforce and hence the outcomes of students, including high school
students. The current degree of consensus around efforts to improve teacher
evaluation is striking for the world of kindergarten-through-12th-grade education. From
traditionally conservative education observers and activists to teachers’ union 
leaders, from professional education organizations to philanthropic foundations, 
and from the U.S. Department of Education to local education agencies, a wide 
array of individuals, groups, and organizations are involved and often cooperating 
in efforts to support the design, testing, and implementation of the next genera- 
tion of teacher evaluation systems. 7 

On the public side, the U.S. Department of Education made teacher
evaluation an integral part of the Obama administration’s $4.3 billion Race to the Top
competitive grant initiative designed to encourage and reward states that are
creating the conditions for education innovation and reform. Meanwhile, in the
nonprofit arena, some of the nation’s most prominent foundations are reallocating
grant money toward teacher evaluation initiatives. One example is the
Bill & Melinda Gates Foundation’s investment of $290 million in four “intensive
partnership” sites to support teacher effectiveness initiatives that include teacher
evaluation, and another $45 million in the Measures of Effective Teaching, or
MET, project, a two-year effort to develop methods and tools for identifying and
developing good teaching.

As the various teacher evaluation initiatives move forward over the coming years, 
how they play out will likely be shaped by a simple but important contextual 
reality. While there is wide agreement about the need for new and better ways to 
evaluate teachers, different stakeholders place different emphases on what they 
ultimately want from evaluation systems. Some see teacher evaluation as a way 
to identify and remove low-performing teachers. Others view teacher evaluation 
as the cornerstone of new performance-based teacher compensation systems. 

Still others think that the emphasis should be on evaluation as a mechanism for 
improving teaching practice, a way to help teachers get better. At the end of the 
day, however, the extent to which any of this can happen rests on evaluation that can
consistently determine who are the more and less effective teachers in our classrooms. 

Information to accomplish this comes from two sources. First, we can use teacher- 
related inputs to the education process, such as classroom teaching observations or 
classroom artifacts such as lesson plans and teacher-designed student assessments. 
From these practice-based measures, we make inferences about a teacher’s ability 
to promote student learning. Second, we can measure outputs from the teaching- 
learning process — actual student performance — and, based on these measures, 
make inferences about the teacher’s contribution to that output. In each case 
doing this well for high school teachers is a challenge. 

In terms of practice-based measures of effectiveness, the many content areas that 
are covered in a typical comprehensive high school make it impossible for all 
teachers to be observed and evaluated by individuals who have training in the 
teacher’s content area. This not only potentially compromises the validity and reli- 
ability of the evaluation results; it decreases the likelihood that teachers will buy 
into and support the evaluation system. 

Likewise, the difficulties in using student performance data to evaluate high
school teachers begin with the fact that these teachers rarely teach in grades or
subjects where students have had comparable pre- and post-tests that can be used
to construct prototypical value-added measures for the teacher. Another issue
in using value-added measures at the high school level is that, unlike the case for
elementary students, we have to worry about the fact that students in, say, an
11th-grade English course took different paths to get to that course.

If these different paths affect their outcomes, then value-added models that do 
not account for this “path dependence” may not accurately estimate the teacher of 
record’s contribution to student learning. A similar problem is present if teach- 
ers affect learning across courses in a given year. Failure to account for this kind 
of “cross-fertilization” would again call into question value-added measures of 
teacher effectiveness. 

Thus, there are clear challenges to effectively evaluating high school teachers.
Nevertheless, states and school districts across the nation are confronting these
challenges, and in the process solutions are emerging. The analysis in this paper
suggests several potential solutions that may be employed in building better
evaluation systems for high school teachers, including:

• Developing new and enhancing existing assessments that test high school
teachers’ content-based pedagogical knowledge

• Exploring, developing, and testing the increased use of technology such as 
classroom video recording as a means for generating efficiency and productivity 
gains in practice-based evaluation 

• Conducting more research on the properties and use of Student Learning 
Objectives, or SLOs, as a measure of effective teaching based on student 
performance 

• Continuing investigations into how value-added measures can be effectively 
used at the high school level 

• Finding the best ways to incorporate all available information from both 
practice-based measures and student performance data into the ultimate 
evaluation of teachers 

This paper examines the challenges and potential solutions to evaluating high school 
teachers, looking first at practice-based evaluation and then turning to student perfor- 
mance as the basis for evaluation. In each case the stage is first set with a brief discus- 
sion of the overarching, across-grade issues that accompany each method. 

In reviewing the issues at hand, it is important to keep in mind that these two 
models of evaluation, practice-based and student-performance-based evaluation, 
make inferences based on different points in the education process — input versus 
output, and they rely on different kinds of data — qualitative and more subjective 
versus quantitative and objective. And they are at different stages of developmen- 
tal evolution — well-established for many years (though evolving) for practice- 
based evaluation versus rapid developments over the last 10 years in using student 
performance data for evaluation. Nevertheless, the early evidence is that most new 
evaluation systems will be characterized by some combination of both of these 
methods to evaluate teachers, including high school teachers. 


Practice-based teacher evaluation


Practice-based teacher evaluation refers to the use of information from various 
aspects of a teacher’s practice in evaluating teacher effectiveness. Practice-based 
information on teacher effectiveness can come from: 

• Classroom observations 

• Teacher-produced artifacts such as lesson plans, teacher-designed student 
assessments, and portfolios 

• Evidence of how a teacher works with colleagues and administrators 

• Communications with parents 

• Professional development activities 

• Other evidence of a teacher’s professional activities 

Of these elements, classroom observations of the teacher at work are arguably the 
most direct evidence of a teacher’s ability to affect student learning. This would 
suggest that evaluation that uses some or all of the above components should give 
classroom observations the most weight in computing a final summative score for 
a teacher. Unfortunately, there is little information regarding how districts cur- 
rently weight the different measures when they combine them into a summative 
evaluation score for the teacher. 8

An important feature of practice-based evaluation to bear in mind is that it is 
based on input measures into the teaching-learning process rather than output 
measures directly associated with student learning and achievement. The argu- 
ment for input-based evaluation is that teachers are evaluated on practices that 
education experts believe to be related to student learning, an argument that has 
less resonance in today’s environment of education accountability than it might 
once have had. There are three practice-based evaluation protocols that have been
linked to student achievement growth, but the great majority of districts use
evaluation protocols that have not been validated against student achievement gains. 9

Until very recently practice-based measures were the only available means in
most districts for evaluating teachers, and while a small handful of districts are 
beginning to incorporate output measures into evaluation systems, using input 
measures of teachers’ practice is still the way that virtually all districts currently 
judge teacher effectiveness. 10 In most districts the responsibilities associated 
with carrying out practice-based evaluation lie with building principals, assistant 
principals, curriculum heads, and other school-level administrators. 11 A major part 
of their evaluation responsibilities is conducting the classroom observations of 
teachers, a time-intensive process. The instruments used to guide the evaluation 
process — the assessment forms, performance criteria, and evaluation proce- 
dures — are largely created by the districts, a process that leads, not surprisingly, 
to considerable variation across districts in the structure, use, and likely quality of 
the evaluation. 12 Some of the problems with principal-led evaluations, in particular 
the tendency to rate teachers higher on the evaluations than is likely warranted, 
so-called “leniency bias,” are detailed in a 1997 paper by Pamela D. Tucker, where 
principals informally identified 5 percent of their instructional staff as being 
“incompetent” but gave only 1 percent formal ratings this low. 13 

Leniency bias is not unique to principal-led evaluations. 14 “The Widget Effect,” a 2009
study by The New Teacher Project of 12 districts that are considered to have some
of the most well-developed evaluation systems in the nation, including systems
that utilize trained evaluators for classroom observations and other activities,
replicated Tucker’s results of a decade earlier. Among the 12 districts that utilized
a binary rating system for teachers, 99 percent of the teachers received the
highest rating. Meanwhile, in those districts that utilized a four-point rating scale, 94
percent of the teachers received one of the top two ratings, and less than 1 percent
were rated at the lowest level. 15 The problem is that when virtually all
teachers receive the highest scores, an evaluation system forgoes the
opportunity to meaningfully differentiate teacher effectiveness in ways that could
provide valuable information to the district and to teachers in need of assistance.

Beyond the failure of existing evaluation systems to capture the variation in
teacher effectiveness within schools that research consistently verifies,
another issue is that high-quality practice-based evaluation can be relatively
costly. The Cincinnati school district, for example, which is widely recognized as
having one of the best evaluation systems in the country, allocated between $1.8
million and $2.1 million per year for teacher evaluation between the 2004-05 and
2009-10 school years. 16 This translates into about $7,500 per teacher for each of
the approximately 250 teachers who underwent a comprehensive evaluation each
year. 17 Approximately 90 percent of this cost is attributable to the salaries of the
12-15 master teachers who leave the classroom for three years to serve as trained
evaluators in the Cincinnati system. 18

While the issues that potentially compromise or complicate practice-based
evaluation are evident at the high school level, so too are its advantages.
In particular, when done well, practice-based evaluation
can tell teachers and administrators what good teachers
are doing well, which can then be shared with colleagues, and what struggling
teachers are failing to do well, where they need assistance to improve. Also,
advocates of practice-based evaluation suggest that because the rubrics against 
which a teacher’s practice will be judged are known and available in public docu- 
ments, discussion around these explicit teaching practices can foster dialogue in a 
school and a district around what good teaching looks like. In this case the evalua- 
tion system can change the discussion around teaching. It is also the case that teach- 
ers and teachers’ unions, including high school teachers, tend to be more supportive 
of practice-based evaluations relative to evaluations based on student test scores. 

Finally, there is emerging evidence from a Cincinnati-based study by Eric S. Taylor 
and John H. Tyler that suggests well-designed practice-based evaluation can help a 
teacher get better. 19 Linking 10 years of data from the Cincinnati practice-based sys- 
tem to student test score data in that district, Taylor and Tyler find that teachers who 
go through a comprehensive evaluation are more effective at promoting student 
achievement growth than they were in the years prior to being evaluated, controlling 
for the effects of experience on teacher effectiveness. The research shows that not 
only are teachers more effective in the year of the evaluation, but the same teachers 
are even more effective at promoting student learning in the years following evalua- 
tion. This research suggests that there are substantial human capital gains associated 
with going through a rigorous and high-quality evaluation process. 

What’s more, the size of the estimated effect is substantial. A student taught by a
teacher after that teacher participates in the Cincinnati evaluation program will score
about 10 percent higher in math than a similar student taught by the same teacher
before that teacher was evaluated. If those two students began their respective years
at the 50th percentile of math achievement, the student who was taught after the
teacher went through evaluation would score at about the 55th percentile at the end
of the year while the other student would remain at the 50th. The study is unable
to identify the mechanisms of the evaluation process that lead to the gains
in teacher quality, but a prime candidate is the feedback that teachers receive after
each of the four classroom observations during the year.

Most discussions around evaluation today focus on its use to identify low- and
high-performing teachers so that the poor performers can be removed from 
the classroom or provided with extra help and, in some few instances, the high- 
performing teachers can be rewarded via a merit-based compensation system. The 
possibility raised by the Taylor and Tyler study that rigorous practice-based evalu- 
ation can help make teachers better, effectively serving as professional develop- 
ment, adds an important dimension to teacher evaluation reform. 20 


Practice-based evaluation in high school 
Challenges and solutions 

The challenge in designing high-quality, practice-based evaluation systems for high 
school teachers begins with the content specialization embodied in high school
education. Unlike elementary school students, high school students move from class to class,
from teacher to teacher, and each teacher presumably (though not always) possesses
specialized content knowledge. This structure poses real issues for practice-based 
evaluation of teachers since in the ideal it requires evaluators who are also content 
specialists. Thus, the high school principals and assistant principals tasked with 
evaluation in most districts would ideally be well-versed in the content being taught 
by each teacher they are evaluating as well as in the best practices for teaching that 
content. Given the number of different courses and content areas taught in a typical 
comprehensive high school, this ideal is not realistic. 

In districts that supplement or supplant principal-led evaluations with master- 
teacher evaluators, as is done for example in Cincinnati and Washington, D.C., 
it is more likely that content expertise between the evaluator and the teacher 
being evaluated can be matched. Even in this situation, however, it is not feasi- 
ble to have an evaluator with content specialization for every subject covered in 
high school. A typical comprehensive high school, for example, can have up to 
50 different courses that are offered and taught, and when career and technical 
education courses are included, the number of different courses across content 
areas can be 100 or more. 21 

Some of the concern regarding content specialization can be mitigated by the
nature of the evaluation protocol that is used to evaluate and score teachers’
practice. The Danielson Framework for Teaching, for example, is designed around
the concept that good teaching looks the same across grades and subjects and thus
the rubrics in the framework are appropriate for all teachers, including high school
teachers. The Classroom Assessment Scoring System-Secondary, or CLASS-S,
tool, the newest protocol from the CLASS family out of the University of Virginia,
is designed for use across subjects in high school. Thus, theoretically, an evaluator
lacking expertise in a given content area should be able to evaluate high school
teachers across subjects using these tools. 22

Using content-neutral evaluation tools does not, however, solve all of the
problems, the most important of which may be teacher buy-in. Consider the case of
a calculus teacher who might be evaluated by someone with the closest available
expertise, perhaps an individual with a math background and math certification,
but who has never taught calculus. Even if the evaluation rubric guides the
evaluator on general instructional behaviors that should be seen and scored in a calculus
classroom, one can imagine the teacher asking “How can I be fairly evaluated by
someone who has never before taught calculus?” This situation suggests a
less-than-ideal dynamic between the teacher and evaluator, a dynamic that in the best
of situations may inhibit how the teacher values any constructive feedback from
the evaluator and in the worst of cases may lead to a union-backed grievance if the
evaluation comes in low and there are negative consequences for the teacher.

There are, however, ways to at least partially address the overarching problem. 
When the principal is not the primary evaluator in the system, districts can 
attempt to spread their evaluation expertise across content areas when selecting 
and training evaluators, knowing there will still be some subjects where there is 
content mismatch between the evaluator and the teacher. When the system relies 
on building administrators to conduct the evaluation, the model could be adapted 
so that, for example, at least one classroom observation is carried out by some- 
one in the district with content expertise that matches that of the teacher under 
evaluation. This will increase the logistical burden of the evaluation system and 
potentially the cost, but the payoff in terms of teacher and union buy-in could be 
worth it in the long run. Even so, given the number of different content areas in 
a comprehensive high school, it is not realistic to think that districts will ever be 
able to provide content-specific evaluation for all teachers under evaluation. 

One possible way to address this problem is for districts to use direct measures
of content pedagogy, that is, assessments given to teachers that would measure how
well a teacher can teach their specific content. While a wide range of such
assessments is not currently available, pioneering work by the Learning Mathematics
for Teaching, or LMT, project at the University of Michigan has led to the
development of such assessments for math in grades K-8. One element of the
aforementioned Gates Foundation’s MET project is to develop content knowledge
assessments for teaching math and English language arts. Similar work on science
topics is being conducted by the Assessing Teacher Learning About Science
Teaching collaborative project between Horizon Research, Inc., and the American
Association for the Advancement of Science. 23 Both the LMT and MET projects
are working to validate their assessments by establishing the links among content
knowledge, observed instructional practice, and student achievement.

Given the K-8 focus of the LMT work and the fact that the only high school 
teachers targeted by the MET project are Algebra I, ninth-grade English, and 
high school biology teachers, a collection of content knowledge assessments for 
use at the high school level is currently not available to districts. That being the 
case, developing content knowledge assessments for the many content areas and 
courses taught in the typical high school is a challenge to be met. If content-spe- 
cific pedagogical assessments could be developed, however, they could potentially 
be used in concert with content-neutral classroom observation tools to evaluate 
high school teachers. Such a blended approach could measure a teacher’s broad 
and general abilities, her content-specific abilities, and at the same time potentially 
increase teacher buy-in. 

The use of digital video technology to capture teaching episodes is another area 
of promise for addressing high school teacher evaluation issues. As a part of the 
MET project, a technological innovation being explored is the use of a digital video 
camera that is set up and operated by the teacher and that allows a simultaneous 
360-degree panoramic recording of everything happening in the classroom as well as 
fixed-position recording of everything that is shared on the classroom blackboard. 24 
In the MET project the videos are uploaded and coded by trained evaluators hired 
to work with the project. Project plans call for several uses for the resulting video 
library, including comparing results across different scoring rubrics and raters, and 
correlating the evaluator-based scores with student achievement gains. 

If video recording of teaching catches on in the field, the results could be far-reaching in
terms of addressing the high school content-specialization problem. Districts would be
able to share secure versions of the digitized teaching episodes with content-specialist
evaluators who could be located anywhere. A district could then have its own core of
evaluators who score video files in a central location or even in their homes. One could
also imagine districts sharing the costs and efforts of evaluators. One district might
not be able to keep, for example, an evaluator of dance instruction busy, but
evaluating dance instruction might be a full-time job for a trained evaluator employed across
several districts. This concept naturally leads to the possibility of the development of
markets for trained evaluators across content areas.

There are other advantages to high-quality classroom teaching videos. First, there
are likely to be substantial cost savings. The cameras used in the MET project currently
cost about $4,500 each and the software license can cost an additional $150 per
teacher evaluated. 25 Using Cincinnati as an example, if 12 full-time evaluators in
the Cincinnati system were replaced with 12 video cameras and the 250
teachers were evaluated via a new video system, the first-year costs for the cameras and
licensing would be about $370 per evaluated teacher. There would, of course, be
additional costs associated with paying evaluators to view and score the videos,
along with other support such as technical support for the operation and some
teacher training on how to set up and use the equipment. Given that Cincinnati
currently spends $7,500 per evaluated teacher, there is a lot of per-teacher money
left for these additional costs after the initial $370 video investment. Also, the
largest part of the $370 per evaluated teacher investment, the $4,500 purchase cost
per camera, is not a cost that recurs every year.
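
The arithmetic behind the $370 figure can be checked with a short calculation. The sketch below uses only the numbers quoted in the paragraph above; the variable and function names are illustrative, and the costs of scoring, technical support, and teacher training are deliberately left out because the text gives no figures for them.

```python
# Figures quoted in the text above; names are illustrative only.
CAMERA_COST = 4_500               # dollars per camera, one-time purchase
LICENSE_COST = 150                # dollars per evaluated teacher
NUM_CAMERAS = 12                  # one camera per replaced full-time evaluator
NUM_TEACHERS = 250                # teachers evaluated per year
CURRENT_COST_PER_TEACHER = 7_500  # current Cincinnati spending per evaluated teacher


def first_year_cost_per_teacher(cameras, teachers, camera_cost, license_cost):
    """Camera purchases plus per-teacher licenses, spread over evaluated teachers."""
    total = cameras * camera_cost + teachers * license_cost
    return total / teachers


first_year = first_year_cost_per_teacher(
    NUM_CAMERAS, NUM_TEACHERS, CAMERA_COST, LICENSE_COST
)
# Per-teacher money left for scoring, support, and training (not costed in the text).
remaining = CURRENT_COST_PER_TEACHER - first_year

print(f"First-year equipment cost per teacher: ${first_year:,.0f}")
print(f"Left over per teacher for other costs: ${remaining:,.0f}")
```

The calculation gives $366 per evaluated teacher, which the text rounds to about $370, and leaves roughly $7,100 per teacher for the remaining costs.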

Another advantage to a video approach to classroom observations is that since the 
evaluator has the option to rerun any portion of the video, there is the opportu- 
nity to review parts of the episode as might be needed, leading to greater scoring 
accuracy. The opportunity for multiple evaluators to score the same teaching 
episode would also allow for increased scoring accuracy and greater objectivity 
relative to “one-shot” in-person classroom observations. 26 Moreover, self-reflective 
teachers would have the opportunity to watch their own teaching episodes and 
learn from them. Finally, over time, a district could build a library of exemplary 
teaching episodes across all content areas for use in professional development 
activities, particularly with new teachers. 

What is not yet clear is whether teachers who currently question the validity of being
evaluated by out-of-content evaluators will have a different attitude toward being
evaluated on video by out-of-district evaluators who would potentially have little
or no contextual information to accompany the videotaped teaching episode.
Nevertheless, given the direct personnel costs and inevitable content mismatch
associated with the current practice-based methods of evaluating high school
teachers, the increased use of technology in teacher evaluation holds substantial
promise, particularly when it comes to addressing the challenge of effectively
evaluating high school teachers across the curriculum. 27

While evaluating high school teachers using performance-based measures poses 
challenges to school districts, the challenges are not insurmountable. The real 
question in the years ahead will not be whether we have the tools and methods 
for evaluating high school teachers in valid and reliable ways, but rather whether 
we exercise the will to evaluate as many teachers as possible to the best of our 
ability and then use that information to differentiate high school teachers in 
ways that allow for meaningful personnel and instructional decisions. 

Another form of teacher evaluation gaining acceptance is based on student performance
measures. Yet, as with practice-based evaluations, there are pluses and minuses.


Using student performance measures to evaluate teachers


Teacher evaluation using student performance measures is a rapidly emerging area
that is gaining a foothold in states and districts across the nation. Several factors
have come together to make this possible. The testing requirements of the No Child
Left Behind Act, or NCLB, that have emerged in the last decade have provided the
necessary measures of student performance. The rapid spread, increased quality, and
declining costs of computerization and electronic data warehousing have eased the
burden of data storage and manipulation. In addition, advancements in statistical
modeling techniques have provided the analytical tools necessary for using student
test scores to evaluate teachers. Together, these factors make it possible to
evaluate a portion of the teachers in most districts based on their ability to promote
student performance on standardized tests, so-called “value-added” measures of
teacher effectiveness.

These quasi-experimental statistical models yield estimates of the contribution 
of teachers to student achievement, controlling for nonschool sources of student 
achievement growth. Another way to characterize value-added measures is that 
they are the difference between the actual achievement of a teacher’s group of 
students and the predicted achievement of these students given their prior test 
scores, demographic characteristics, and other measures in the model. Simply put, 
the objective of a value-added growth model for teacher evaluation is to eliminate 
factors contributing to both student achievement levels and student growth over 
which a teacher who is being evaluated has no control. 
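
The "actual minus predicted" idea can be illustrated with a toy calculation. The sketch below is a deliberately simplified illustration, not the model used by any district or study cited here: it invents six student records, predicts each end-of-year score from the prior-year score alone using a one-variable least-squares line, and averages each teacher's residuals. Operational value-added models also control for demographic characteristics and other measures, as the text notes; the data, names, and single-predictor model are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, end-of-year score).
records = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 44), ("B", 50, 55), ("B", 60, 64),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return y_bar - slope * x_bar, slope


def value_added(rows):
    """Mean of (actual - predicted) end-of-year scores for each teacher."""
    priors = [prior for _, prior, _ in rows]
    posts = [post for _, _, post in rows]
    intercept, slope = fit_line(priors, posts)

    residuals = defaultdict(list)
    for teacher, prior, post in rows:
        predicted = intercept + slope * prior  # expected score given prior score
        residuals[teacher].append(post - predicted)
    return {teacher: mean(values) for teacher, values in residuals.items()}


scores = value_added(records)
for teacher, score in sorted(scores.items()):
    print(f"Teacher {teacher}: {score:+.2f} points relative to prediction")
```

In this toy data set, teacher A's students score above their predicted level on average and teacher B's students score below it by the same amount, so the estimates are relative rankings within the group rather than absolute measures.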

From the district standpoint, the advantages of using value-added to measure 
student growth and evaluate teachers begin with the fact that this method is 
much cheaper than practice-based teacher evaluation. Given that districts already 
have the data in hand for calculating teacher value-added measures, the primary 
costs to develop value-added systems are likely to be related to technical assistance 
consultation that may be needed to help with data system construction and model 
development. These costs should fall as value-added models and the data 
structures required for them become more standardized and established over time. 

A second advantage of value-added measures of teacher effectiveness is that they 
evaluate teachers on observed output — student learning. One could argue that 
the tests on which value-added measures are based do not adequately measure 
the types of student learning that we want to promote in schools. This, however, 
is a criticism of the tests currently in use rather than a criticism of the value-added 
methods that rely on these tests. 28 

It is also the case that value-added measures are more objective than performance- 
based measures that ultimately rely on human judgment and that these statisti- 
cal measures of teacher effectiveness generate differentiation among teachers by 
construction. Given that districts do not currently use performance-based teacher 
evaluation in ways that produce meaningful differentiation among teachers, the 
variation inherent in value-added scores is potentially a significant benefit. 29 

Not surprisingly, there are cautions that accompany the use of value-added as a 
method for evaluating teachers, beginning with the fact that value-added measures 
provide no information that could help a teacher become more effective. With 
value-added estimates, a teacher only knows where he or she ranks in the value- 
added distribution relative to other teachers. There is no additional information in 
a teacher’s value-added score that would inform the teacher as to why he or she is 
ranked low or high relative to others. 

There is also concern that if value-added-based evaluations are used for high- 
stakes decisions, then teachers will have the incentive to “teach to the test” or 
cheat. In this case “teaching to the test” does not refer to explicitly teaching 
material that a given test covers, something we might want teachers to do for well- 
constructed tests. Rather it refers to spending valuable class time on test-taking 
techniques or focusing on responses to specific, expected questions — it is teach- 
ing that does not promote real and lasting learning gains. When it comes to cheat- 
ing, there is evidence that when test results are tied to high-stakes decisions, some 
teachers will resort to cheating as a way of artificially inflating test scores. 30 

Another concern is that, at their best, value-added measures only capture a 
teacher’s contribution to his or her students’ learning as measured by standardized 
test scores. The concern here is that these tests only measure a subset of what we 
want students to learn and by focusing teacher evaluation on this sphere, we risk 
distorting teaching and learning that encompasses the breadth of what we want 
students to learn. One counterargument to this is that standardized tests cover 
material that has been developed by relevant stakeholders and is deemed to be 
important, and that these tests tend to be predictive of later life outcomes that we 
care about. 31 Another counterargument is that for effective teachers, preparing 
students to do well on these tests is a byproduct of their normal teaching across 
the breadth of the curriculum. 

Two related issues — measurement error and intertemporal stability — also pose 
some cause for concern when it comes to value-added estimates. Performance 
on any assessment will vary from one administration to the next for random 
reasons — a natural fluctuation known as measurement error. For value-added 
measures this means that any given teacher’s ranking in a distribution of teacher 
value-added estimates is an approximation that contains a degree of uncertainty. 
What this means practically is that identification of teachers in the upper or lower 
reaches of the value-added distribution can be done with reasonable confidence. 
The value-added models, however, tend to be less useful for differentiating teach- 
ers who are more proximate to each other in the distribution. 
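
A rough way to see why the tails of the distribution are easier to separate than the middle is to compare the gap between two teachers' estimates with the uncertainty in that gap. The sketch below is a simplified illustration, assuming independent, approximately normal estimation errors and a conventional 95 percent threshold; the function name `distinguishable` and all numbers are hypothetical.

```python
import math


def distinguishable(est_a, se_a, est_b, se_b, z=1.96):
    """True if two value-added estimates differ by more than their
    combined uncertainty (assumes independent, roughly normal errors)."""
    gap = abs(est_a - est_b)
    se_gap = math.sqrt(se_a ** 2 + se_b ** 2)
    return gap > z * se_gap


# Hypothetical estimates in student-level standard deviation units.
far_apart = distinguishable(0.30, 0.08, -0.25, 0.08)  # top versus bottom
neighbors = distinguishable(0.05, 0.08, 0.00, 0.08)   # adjacent teachers
```

With these illustrative numbers, the teachers at opposite ends of the distribution can be told apart, while the two teachers near the middle cannot.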

Measurement error also comes into play if instead of considering the value-added 
measures of two different teachers in the same year, one were interested in the 
value-added measures of the same teacher in different years. The concern here 
is about the degree of intertemporal stability of the value-added measures: the 
extent to which value-added scores for any given teacher are stable from one year 
to the next. While teachers’ effectiveness may change somewhat from year to year, 
one would not expect radical year-to-year changes in true, underlying ability. Thus, 
if value-added scores are good estimates of a teacher’s ability, then they should be 
relatively stable across time. 32 Statistically speaking, this means we would expect a 
high year-to-year correlation in value-added scores. If one year’s value-added score 
perfectly predicted the next year’s score, then the year-to-year correlation would 
be one. And if there is no year-to-year relationship, then the correlation would be 
zero. Results to this point across numerous studies suggest that year-to-year 
value-added correlations are usually 0.2 to 0.5 for elementary school teachers, which implies 
that a teacher’s value-added score in one year is not necessarily a good predictor 
of his or her value-added score in the subsequent year — a troubling proposition to 
some. It is the case, however, that with more information this problem is substan- 
tially abated. Using data from two prior years, for example, instead of one, substan- 
tially improves the ability to predict future performance. 33 
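
One stylized way to see why additional years help is to treat each year's score as a stable teacher effect plus independent noise. This is an assumption made for illustration only, not a model taken from the studies cited; in particular it ignores real year-to-year changes in effectiveness. The symbols below ($v_t$, $\tau$, $\varepsilon_t$, $r$) are introduced here for that purpose.

```latex
\begin{align*}
v_t &= \tau + \varepsilon_t, \qquad
  \operatorname{Var}(\tau) = \sigma_\tau^2, \quad
  \operatorname{Var}(\varepsilon_t) = \sigma_\varepsilon^2 \\
r &= \operatorname{Corr}(v_1, v_2)
  = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_\varepsilon^2} \\
\operatorname{Corr}\!\left(\tfrac{1}{2}(v_1 + v_2),\, v_3\right)
  &= r \sqrt{\frac{2}{1 + r}}
\end{align*}
```

Under these assumptions, a one-year correlation of 0.3 rises to about 0.37 when two prior years are averaged, and a one-year correlation of 0.5 rises to about 0.58, because averaging halves the noise variance while leaving the stable component untouched.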

Regarding the statistical properties of value-added estimates, it is worth reiterating 
two points made in a study by Dan Goldhaber. 34 The first is that there are errors in 
measurement in any endeavor to capture complex human behavior. This includes 
using practice-based measures of teaching effectiveness, even though measure- 
ment error associated with, say, classroom observation scores is rarely considered 
or discussed. Second, value-added estimates are on par with performance mea- 
sures in other fields in terms of their predictive validity. This includes the most 
statistical of all professional sports, Major League Baseball, where statistical 
measures of output are regularly used to reward players and construct employment 
contracts, even though the year-to-year correlations of many output measures are 
similar to what we see for value-added scores. 35 

The purpose of controlling for prior student test scores and other student- and 
class-related factors in value-added measures is to try to isolate the effect of the 
current teacher on student achievement gains. Jesse Rothstein has conducted 
research that has brought into question how well typical value-added models actu- 
ally accomplish this, bringing attention to potential biases in value-added esti- 
mates. These biases arise when students are sorted into classrooms on the basis of 
“dynamic” factors such as student home issues that are time varying, unobservable 
to the researcher, and are related to student outcomes. 36 

Similar to the just-discussed instability issues, a response to the concern lies in 
having additional years of data to correct for this type of potential bias. Using rich 
data from the San Diego school district, Cory Koedel and Julian Betts were able 
to replicate the biases revealed in Rothstein’s work. Additionally, they show that 
a sufficiently complex value-added model that evaluates teachers over multiple 
years can reduce the sorting bias to statistical insignificance. 37 One implication of 
this line of work is that while one year of value-added data might be used to make 
low-stakes decisions such as deciding on professional development allocation, a 
district would want to rely on multiple years of data for making high-stakes deci- 
sions such as teacher tenure or job termination. 

A last concern with value-added measures of teacher effectiveness is that only 
15 percent to 35 percent of the teachers in any given district teach in grades and 
subjects where students have both a standardized end-of-year test and a suitable 
pretest. 38 Primarily because of the state standardized tests arising from the NCLB 
testing regime, these “tested” grades and subjects tend to be math and reading in 
grades 3-8. This is obviously a major challenge for using value-added to evaluate 
high school teachers, but some of the top value-added researchers in the country 
are working on this problem, as will be discussed shortly. 


Evaluating high school teachers based on student performance 


Inherent challenges 

Basing a substantial portion of a teacher’s evaluation on student performance mea- 
sures was not only required to win Race to the Top funds; it is something that many 
state departments of education and school districts across the nation are exploring 
even without supporting federal funds. After all, teachers teach in order to impact 
student academic performance, and many argue that it makes sense to evaluate 
teachers on how well they accomplish that goal. While value-added measures are 
currently the most researched and talked about method for doing this, there are 
other methods under consideration that will be discussed in this section. First, let’s 
look at the special issues that high school teachers pose for value-added models. 

As a roadmap to this discussion, the following issues will be taken in turn: 

• The availability of suitable tests at the high school level 

• The path dependency of students in a given class 

• The potential spillover effects of teachers in different subjects 

• How to weight the different subjects a teacher might teach in an overall value- 
added score 

• The potential for perverse incentives unique to high school 

• The logistics of student attribution 

• Having small numbers of teachers in the value-added distribution 


Availability of suitable tests 

Unlike in grades 3-8, there are no regular state tests across grades and core subjects 
mandated by NCLB at the secondary level. 39 As a result, most states have not 
developed state tests for all high school grades, posing a central challenge at the 
high school level for using standard value-added models that require both a pre- 
and post-test. 40 Additionally, while math and reading can be thought of as 
“core subjects” in the elementary and even middle school grades, the notion of 
core subjects has much less traction in high school, 
where students begin to branch out and explore different topics and content. 

Some states and districts can partially address the outcome-test problem via 
“end-of-course,” or EOC, exams or semester exams that have been developed to 
assess student mastery of course-specific knowledge. These exams are tied to state 
or district curricula in a specific course, are administered as students complete a 
course, and usually have high stakes for the student such as whether the course is 
passed for credit. Currently, there are 23 states that have EOCs for at least some 
courses, but even in these states, many high school teachers still teach in grades or 
subjects that are not covered by EOCs. 41 Georgia is an example of a state that has 
developed statewide EOCs that all students in given courses have to take and pass. 
There are, however, only eight content areas across the four core subject areas 
(math, ELA, science, and social studies) that have EOCs in Georgia. Thus, while 
students taking biology, for example, would have an EOC in Georgia, students 
taking physics would not. 

The one district in the nation that is working on developing a comprehensive 
arsenal of course-specific semester examinations is Hillsborough County Public 
Schools, Tampa, Florida. Hillsborough County Public Schools has developed 
over 500 semester examinations for courses ranging from 9th-grade English to 
sculpture to trigonometry to welding. Hillsborough also worked with the Florida 
Department of Education to make their semester examinations available to other 
districts across the state via the Florida End of Course Exam Clearinghouse. The 
scope of the Hillsborough effort, over 500 exams, illustrates the magnitude of the 
effort required to develop course-specific examinations that could cover every 
high school teacher. 

Nevertheless, as districts and states develop semester or EOC exams and share 
them through state-sponsored exam clearinghouses or other venues, an increasing 
number of high school teachers will at least be teaching in courses that will have 
an associated post-test for use in value-added calculations. 

In grappling with the high school test availability issue, researchers often concep- 
tualize the value-added model less in terms of measuring student growth from 
one year to the next, as is the general conceptual model at the elementary level, 
and more in terms of exceeding (or falling short of) predicted achievement. 43 This 
conceptualization helps clarify the expansion of value-added models to noncore- 
subject areas and grades, like high school, where growth may not be easily defined 
or measured. In this realm there may or may not be pre-tests available for use as 
direct predictors of the outcome and, as a result, researchers will often have to rely 
on what could be termed related predictors of the outcome. 44 

Taking math as an example, as a result of NCLB mandates, students in grades 4-8 
will have both an outcome measure (for example, the fourth-grade state math exam) 
and a direct predictor in math (the previous year’s third-grade state math exam). 45 
Contrast this with the high school situation where even when there is an available 
EOC exam that can be used as an outcome measure (an EOC exam in geometry), 
there may seldom be a suitable prior test that could be used as a direct predictor. 
As a result, value-added models at the high school level will often have to rely on 
related predictors of the outcome measure. 

At this point, the value-added research community is in the early stages of develop- 
ing and testing models that attempt to address the lack of direct predictors (pre- 
tests) in high school. In this work, some models use scores from contemporaneous 
tests in other subjects as related predictors. 46 That is, if the outcome is an EOC for 
10th-grade geometry, then scores from 10th-grade English and science exams, if 
available, might be used in the predictive model. Other models under consideration 
and testing include not only contemporaneous, other-subject tests but also an EOC 
or summative exam from the previous year in the same content area. Still other 
models being studied would incorporate the content area eighth-grade state exam 
as a secondary predictor. At this point researchers are engaged in studies to develop 
the best possible predictive value-added models for use in evaluating high school 
teachers. Success on this front may only start the decision process for districts as it is 
unlikely that there will be black and white answers as to what is the “right” value- 
added model a district should use to evaluate high school teachers. 

Since different models will embody different assumptions and rely on differ- 
ent variables to control for outside-of-class influences on student performance, 
districts will likely have to make policy decisions regarding which models they will 
use for high school evaluation. Nevertheless, the work on this front is impressive: 
great strides have been made, and many districts and states are currently in 
the process of considering how, not whether, to use value-added as one factor in 
evaluating their high school teachers. 


Path dependency 

In elementary school students generally follow the same path through the grades 
with little deviation in terms of the subjects they study and in what year they 
study them. Thus, elementary-level value-added estimates rest on a fairly solid 
assumption that, in general, students arrive at the outcome measure via the same 
academic pathway. Students taking the fifth-grade state math exam, for example, 
get there by passing through first through fourth grades and taking the math curricula in 
each of these grades. Where there are deviations, say, for being held back a year or 
being in a gifted and talented program, these deviations can be accounted for in 
the model as long as this information is in the administrative data. 

On the other hand, a common path to a given course is rarely the case in high school 
given that students in the same cohort can take different courses each year as well as 
different sequencing of the same courses as they progress through high school. If the 
courses taken prior to or contemporaneous with some outcome measure have the 
potential to impact that measure, and if the impact is different depending on when 
the earlier courses were taken relative to the outcome of interest, then it is important 
to account for such path dependency in high school value-added models. Given 
that there are many possible “paths” to, say, the EOC exam in 11th-grade physics, 
accounting for path dependency is not a simple matter. 

One way to study this potential problem is to use existing data to study how 
sensitive value-added measures are to path dependency. Existing data in a given 
district, for example, could be used to determine the most common two, three, 
or four paths students take to, say, 11th-grade physics. Value-added estimates 
that do and do not control for these paths could then be compared. If 
it turns out that in most cases path dependency is not a strong predictor of the 
outcome, given other predictors in the model, then there is less need to 
account for path dependency. If this is not the case, then the task will lie in 
developing tractable methods that control for the different paths students take in 
arriving at the outcome of interest. 
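
The comparison described above can be sketched in a few lines of Python. This is a deliberately stripped-down illustration, not a full value-added model: it benchmarks each student's score against either the overall mean or the mean for students who followed the same course path, and then averages the gaps by teacher. The path labels ("P1", "P2"), the function name `teacher_effects`, and all scores are hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def teacher_effects(records, control_for_path):
    """records: list of (teacher, path, score) tuples.

    Each score is compared with a benchmark: the mean for the student's
    course path if control_for_path is True, otherwise the overall mean.
    """
    overall = mean([score for _, _, score in records])
    by_path = defaultdict(list)
    for _, path, score in records:
        by_path[path].append(score)
    path_means = {p: mean(v) for p, v in by_path.items()}

    gaps = defaultdict(list)
    for teacher, path, score in records:
        benchmark = path_means[path] if control_for_path else overall
        gaps[teacher].append(score - benchmark)
    return {t: mean(v) for t, v in gaps.items()}


# Made-up physics scores; teacher A draws mostly from path P1.
records = [
    ("A", "P1", 80), ("A", "P1", 84), ("A", "P2", 70),
    ("B", "P1", 82), ("B", "P2", 72), ("B", "P2", 68),
]
without_path = teacher_effects(records, control_for_path=False)
with_path = teacher_effects(records, control_for_path=True)
```

In this contrived data the two teachers differ by four points when paths are ignored and not at all once paths are taken into account, which is the kind of sensitivity a district would want to detect. If the two sets of estimates were instead nearly identical, path dependency would be less of a concern.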


Teacher spillover effects 

Again, unlike in elementary school, where students tend to be taught all of the 
subjects by one teacher, high school students have up to five or six different teachers 
every semester. If a student’s performance is influenced by all of his or her teachers, 
regardless of the outcome measure, then value-added estimates for the teacher of record 
that did not account for the spillover effects of other teachers could be biased. If 
most of a given teacher’s students, for example, tended to have very high-perform- 
ing teachers in their other subjects and there were spillover effects that affected 
the students’ scores in the outcome subject, then failure to account for this would 
lead to upward bias in the value-added estimates of that teacher: He or she would 
receive credit for the systematically good teachers that his or her students had in 
their other subjects. This too is a potentially thorny issue because of all of the 
potential combinations of “other teachers.” Again, a potential way to study this issue is 
to use existing data and model some of the most common combinations of other 
teachers observed in the data for given outcomes of interest and determine how 
important it is to account for students’ other teachers. 47 


Teacher weighting across different courses 

The issue here is that some, if not many, high school teachers will teach more than 
one course. The typical high school math teacher, for example, might teach an 
Algebra I course along with some Algebra II courses and some geometry courses 
in the same year and/or semester. Or, given the breadth of courses offered in 
the typical comprehensive high school, a single teacher might be responsible for 
teaching, say, history, psychology, and sociology. In developing a measure of how 
effective a teacher is, how should the value-added-related performance in each of 
the various subjects be weighted? This is a policy decision rather than a modeling 
decision that districts will have to make, and there is no clear answer as to what is 
the “right” decision. Since the decision will most likely affect teachers’ final evalu- 
ation score, however, it is a serious issue in evaluation system design that districts 
will have to consider. 
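
A small Python sketch shows how much the weighting choice can matter. It compares two plausible policies, weighting each course by enrollment and weighting each course equally; neither is presented as the correct one, and the function name `combined_score`, the course-level scores, and the enrollment counts are all hypothetical.

```python
def combined_score(course_scores, weights):
    """Weighted average of course-level value-added scores."""
    total = sum(weights[c] for c in course_scores)
    return sum(course_scores[c] * weights[c] for c in course_scores) / total


# Hypothetical course-level value-added scores for one math teacher.
course_scores = {"Algebra I": 0.20, "Algebra II": -0.10, "Geometry": 0.05}
enrollment = {"Algebra I": 30, "Algebra II": 90, "Geometry": 60}
equal = {course: 1 for course in course_scores}

by_enrollment = combined_score(course_scores, enrollment)
by_course = combined_score(course_scores, equal)
```

Here the same teacher lands exactly at the average (0.0) when courses are weighted by enrollment but above average (0.05) when each course counts equally, so the policy choice alone can move a teacher's overall score.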


Perverse incentives 

There are certain perverse incentives that can arise at any grade level when teacher 
evaluation is based on student test scores. High school teachers whose evalua- 
tions are dependent upon their students’ test scores are certainly not immune 
from the temptation to “teach to the test” or to cheat in ways that could artificially 
inflate test scores. In addition to these grade-neutral perverse incentives, however, 
teacher evaluation based on value-added models at the high school level must 
guard against perverse incentives specific to high school. In particular, we would 
want to guard against practices that incentivize teachers to encourage their 
lowest-performing students to either drop out of school or drop the course before 
the end-of-year exam. High school evaluation systems also need to ensure that all 
students, including those repeating a course, which is most common in the ninth 
grade, are included in the testing regime so that teachers will have the incentive to 
teach all students to the best of their ability. 


Logistics of attribution 


Assigning the correct students to the correct teachers is an important issue for 
all value-added models, and the evidence is that correctly capturing a teacher’s 
roster of students based on administrative data is a far-from-trivial task. In a 
study by the nonprofit, school-improvement organization Battelle for Kids that 
covered 57 districts and 730 schools in two states, 26 percent of all student 
assignments were moved, added, or deleted from the district-reported data 
through Battelle’s more carefully executed teacher-linking process. 48 Correct stu- 
dent attribution is an even more difficult process at the high school level than at 
the elementary level if for no other reason than each teacher has multiple classes 
of students, each of which requires accurate attribution. Roster verification, 
especially in high school, is a major challenge that districts contemplating the use of 
value-added measures have to take seriously. 

Given both the challenge and the importance of correctly linking students and 
teachers, several organizations across the nation are devoting resources and time to 
help states and districts develop data systems and infrastructure that will allow for 
the kind of linkage required by high-quality value-added systems. Among these 
organizations are Battelle for Kids, the Data Quality Campaign, and the Teacher-Student 
Data Link Project of the Center for Educational Leadership and Technology. 49 


Teacher group size 

All value-added estimates place teachers in a distribution of other teachers. In the 
case of value-added at the elementary level, the teacher is usually compared to 
other teachers in the district in the same grade and year. Thus a fifth-grade teacher 
in a midsized school district might be in a distribution with 75 to 100 other 
fifth-grade teachers. A potential issue with high school value-added is that in addition 
to the grade-by-year comparison group, teachers need to be compared to other 
teachers teaching the same subject. This can be problematic for mid- to 
small-size districts as there may not be enough, say, physics teachers in the district 
for the comparison to be meaningful. In that type of scenario, what does it mean to 
be at the bottom (or top) of the value-added distribution of physics teachers when 
there are four physics teachers total in the district? Value-added with small numbers 
of teachers in the comparison group probably makes little sense. In these cases, 
other measures of teacher effectiveness will have to be used, or the district, in 
conjunction with the state department of education, would need to determine whether it 
makes sense to compare teachers statewide rather than districtwide. 


Potential opportunities 


There are also potential opportunities for using value-added measures in high 
school teacher evaluation not present in the elementary school setting. Since 
the end-of-year state exams used at the elementary level are taken by all stu- 
dents in a state, they are largely designed to be “curriculum free.” In contrast, 
the EOC tests and other course-related tests being considered for high school 
value-added are designed to be in alignment with the curriculum being taught. 
As a result, teachers have more power to impact student performance on tests 
aligned with the taught curriculum than they do on performance on curricu- 
lum-free state exams. It is also likely that incentives are better aligned, which 
in turn yields greater teacher buy-in when the test over which a teacher will be 
held accountable is sensitive to the taught curriculum. 

A second potential advantage for high school value-added is that in most cases 
there will be substantially more students on a high school teacher’s roster than 
an elementary teacher’s since high school teachers as a rule teach several classes. 
These additional students help reduce the measurement error discussed earlier, 
meaning that a teacher’s value-added score is more precisely estimated. 
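
The effect of roster size on precision can be illustrated with the familiar standard-error formula. The sketch below assumes that student-level noise is independent across students and has the same spread for every student, which real classrooms only approximate; the function name `standard_error`, the spread of 10 points, and the roster sizes are hypothetical.

```python
import math


def standard_error(student_sd, roster_size):
    """Standard error of a roster's mean, assuming independent students."""
    return student_sd / math.sqrt(roster_size)


# Hypothetical rosters: one class of 25 versus four classes of 25.
elementary = standard_error(10.0, 25)
high_school = standard_error(10.0, 100)
```

Under these assumptions the high school teacher's estimate is twice as precise (a standard error of 1.0 point rather than 2.0), because quadrupling the number of students halves the standard error.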

A final opportunity worth noting is that the series of challenges a district faces 
in trying to develop good value-added measures for their high school teachers is 
leading districts to contemplate the entire evaluation, testing, and data enter- 
prise of the district in a more coherent way than has generally been the case 
to this point. Districts on the leading edge of this work understand that using 
value-added to evaluate high school teachers requires thinking about testing 
and data issues in a coherent and systematic way. When done well this holis- 
tic approach can have positive spillover effects that can benefit more than just 
teacher evaluation in the district. 


Other student performance measures 

As previously noted, many states and districts are developing student perfor- 
mance measures that are not based on value-added estimates. In large part these 
efforts are in response to some of the difficulties discussed above in constructing 
value-added measures for teachers in nontested grades and subjects. One result of 
this work is the development of Student Learning Objectives to measure teacher 
effectiveness. SLOs are data-based targets of student growth that teachers set for 
the students at the start of the semester or school year. Teachers are then evaluated 
at the end of the semester or school year on the extent to which their students met 
the objectives that they set for their students. SLOs are currently in use in Denver, 
Colorado; Austin, Texas; and Charlotte-Mecklenburg, North Carolina. Many 
new evaluation systems, including the state system in Rhode Island, are also 
incorporating SLOs. 50 In some instances SLOs are used only in nontested 
grades and subjects, while in other cases a district is incorporating SLOs into the 
evaluation of teachers in addition to, rather than in place of, value-added measures. 

The typical steps a teacher goes through in setting SLOs are: 51 

• At the beginning of the semester or year, review available data on the students 
in the class, including prior-year test performance and any course pre-tests that 
have been administered. 

• Based on the data, set a designated number of objectives, usually two — class- 
roomwide and student- or subgroup-specific. An example of a class-level objec- 
tive might be: “Increase the Algebra I end-of-course pass rate by 5 percentage 
points over last year’s 85 percent pass rate.” 

• Identify appropriate measures against which objective attainment will be 
judged. Following the above example, the measure for judging objective attain- 
ment would be the class pass rate on the Algebra I end-of-course exam. 
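
The class-level example in the steps above can be expressed as a simple check. In this minimal sketch the 85 percent baseline and the 5-percentage-point gain come from the example objective; the function names (`slo_target`, `slo_met`) and the student counts are hypothetical.

```python
def slo_target(baseline_pass_rate, gain_points):
    """Target pass rate in percent: last year's rate plus the planned gain."""
    return baseline_pass_rate + gain_points


def slo_met(passed, tested, target_rate):
    """True when the class pass rate (in percent) reaches the target."""
    return 100 * passed / tested >= target_rate


target = slo_target(85, 5)  # 90 percent
met = slo_met(passed=28, tested=30, target_rate=target)
missed = slo_met(passed=26, tested=30, target_rate=target)
```

A class in which 28 of 30 students pass (about 93 percent) meets the objective, while a class in which 26 of 30 pass (about 87 percent) does not, even though it improved on last year's rate.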

In many instances teachers must review and discuss their SLOs with their prin- 
cipal (and sometimes central office staff) as a part of the SLO-setting process. 
The principal (and central office staff when appropriate) must then approve and 
sign off on each teacher’s SLOs. While there is little research to date on SLOs, 
principal and/or central office approval is likely a key part of the SLO process. 
Without oversight, the incentive is for teachers to set easily achievable SLOs that 
their students can readily meet. 

Most of what we know about SLOs at this point comes from Denver and Austin. 
A 2004 study of a pay-for-performance pilot program in 17 Denver schools, 
including two middle schools and two high schools, found that 89 percent to 93 percent 
of the teachers in the pilot met their objectives over the four years of the pilot 
(1999-2003). Further, the quality of the SLOs set by teachers went up over the 
four years. 52 In studying the relationship between the quality of the SLOs set by 
middle school and high school teachers and their students’ performance, there 
was consistent evidence of a positive relationship, though this evidence was 
often not statistically significant. Positive and statistically significant correlations 
between the number of objectives met by teachers and the achievement of their 
students were found in the two high schools. 53 

In Austin, SLOs are part of the REACH strategic compensation program. 54 There, 
SLOs are based on the state curriculum — the Texas Essential Knowledge and 
Skills. In 2009-10 teachers in REACH schools received stipends of between 
$1,000 and $1,500 for each met SLO. The SLO process in Austin begins with 
teachers examining their students’ performance data and identifying two areas 
of greatest need. Pre-assessments are then administered to the students in the 
selected areas of need. Based on the results of the pre-tests, teachers then set SLOs 
that must be reviewed and approved by the campus principal and district central 
office. Each SLO must indicate performance targets that students will meet by 
the end of the school year and how performance will be assessed. Student results 
from the end-of-year post-assessment are then used to determine whether or not 
a teacher met his or her SLOs. In an analysis based on 2009-10 state test data, 
students of REACH high school math and science teachers who met at least one 
of their SLOs demonstrated greater net achievement growth than did the students 
of REACH teachers who did not meet SLOs. 55 

The nonexperimental nature of both the Denver and the Austin studies relating 
SLOs to student achievement leaves some doubt regarding that relationship and 
indicates the need for additional studies on the topic. Future studies should also 
look at the relationship between SLOs and student achievement gains, not just 
levels. Another issue with which SLO-based evaluation will need to grapple in the 
future is comparability across classrooms when SLOs are based on the individual 
choices of teachers. Finally, it is not clear at this point how effective the SLO 
evaluation process can be at differentiating among teachers who are differentially 
effective. The early evidence from the Denver pilot project looks disturbingly similar 
to the results from “The Widget Effect.” 

Another way in which several districts and at least one state (Delaware) are 
developing non-value-added measures of student achievement growth is through 
“common growth measures.” To develop a set of common growth measures, a 
committee of teachers and other educators at either the state or district level 
reviews available measures of student growth and then makes recommendations 
to the district or state, which then approves a list of measures for each subject 
and grade under consideration. In Delaware more than 300 educators have 
been enlisted to help develop student growth measures in 30 content areas. 56 In 
addition to Delaware, five upstate school districts in New York are developing 
common growth measures to use in evaluation systems. This is a promising and 
much-needed development in that the common growth measure process is even 
less studied than SLOs. At this juncture it is too early to know the extent to which 
either can serve as useful tools in teacher evaluation systems. 
Conclusion 


Developing effective ways to evaluate high school teachers holds the promise 
of improving the high school education experience for the millions of students 
who will be taught by these teachers. Based on what we know from research and 
experience, the following ideas should guide evaluation system development and 
implementation and the ongoing work in this area in the coming years. 


The best evaluation systems will incorporate all available 
information, from both practice-based measures and student 
performance data, into the ultimate evaluation of teachers 

This means that both value-added measures and SLOs should be used when possible 
and that value-added measures should be used with as many high school teachers 
as possible. The tough decisions should concern how to weight the different 
components, including value-added, not whether to use them at all in teacher 
evaluation. Based on the research evidence, systems that do not use all of these mea- 
sures are leaving information about the effectiveness of their teachers on the table. 
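
As a rough illustration of the weighting question, the sketch below combines several component scores into one composite using district-chosen weights. The component names, the common 0 to 100 scale, and the weights are hypothetical assumptions and are not drawn from any system discussed in this report. When a teacher lacks a component (for example, no value-added score is available), the remaining weights are renormalized, which is only one of several defensible design choices.

```python
def composite_score(scores, weights):
    """Weighted average over the components a teacher actually has.

    scores:  dict of component name -> score on a common scale, or None if unavailable
    weights: dict of component name -> nonnegative weight (hypothetical values)
    """
    # Keep only the components for which this teacher has a score.
    available = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no weighted component is available for this teacher")
    # Renormalize so the weights of the available components sum to 1.
    return sum(weights[name] * s for name, s in available.items()) / total_weight


# Hypothetical weights chosen only for illustration.
WEIGHTS = {"observation": 0.5, "value_added": 0.3, "slo": 0.2}

# A teacher with all three components, and one without a value-added score.
with_va = composite_score(
    {"observation": 80, "value_added": 60, "slo": 70}, WEIGHTS
)
without_va = composite_score(
    {"observation": 80, "value_added": None, "slo": 70}, WEIGHTS
)
```

Under this sketch, a teacher without value-added data is still evaluated on the remaining measures, while teachers with such data have it counted, in keeping with the argument that no available measure should be discarded.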

At the same time, just because it may be hard to develop value-added measures 
for all high school teachers in a district, districts should not use this as an excuse 
to forgo value-added evaluation for the subset of teachers for whom the data are 
available. Just as the standard pay scale across all subject areas that is common in 
education is shortsighted and reduces the supply of qualified math and science 
teachers who could be recruited into schools, refusing to use value-added informa- 
tion with some teachers because it may not be available for all is equally unwise. 


A good evaluation system will not shortchange its practice-based 
evaluation component 

There are very few systems that currently make the investments necessary to 
conduct high-quality practice-based evaluation. Well-designed systems use dedicated 
and trained evaluators, master teachers who leave the classroom for two to 
three years and whose sole responsibility is evaluation of the district’s teachers. 
Teachers are observed in the classroom multiple times during the year, the class- 
room observations are of sufficient length to gather meaningful data about that 
teaching episode, and the visits are mostly unannounced. Well-designed systems 
use research-based evaluation protocols and the training of evaluators on these 
protocols is taken seriously. While such systems can be costly, districts can offset 
some of the costs by only putting a portion of the teachers on the full, comprehen- 
sive evaluation cycle each year. It is arguably better practice to evaluate a portion 
of the teachers at a level of high quality each year than it is to evaluate all of the 
teachers with low-level evaluation every year. Furthermore, research suggests that 
high-quality evaluation can pay for itself by increasing teacher productivity, and 
that, given this, districts could further offset evaluation costs by redirecting some 
professional development dollars to evaluation. 57 

It is likely that many districts that rely on principals for their teacher evaluation 
activities do so because they see the cost savings of this approach relative to hav- 
ing a cadre of master teachers who serve as evaluators. It is also likely that few of 
these districts take into account the opportunity costs of burdening principals 
with this extra duty when considering the costs and benefits of principal-led evalu- 
ation relative to alternative approaches to evaluation. Given the rising importance 
of high-quality teacher evaluation going forward, districts should 
consider the full costs and benefits, not just the accounting costs, of using principals 
to carry the bulk of the evaluation load. Once the full costs and benefits of relying 
on principals, on full-time trained evaluators, or on some combination of the two 
are weighed, principal-led evaluations may not be as attractive as districts 
currently perceive them to be. 


We need to continue the work that is being done to develop and test 
value-added models appropriate for use with high school teachers 

An important test for resulting models will be the extent to which high school 
teachers’ value-added scores are correlated with their classroom observation 
scores. This linkage has been established in a few important instances 
at the elementary and middle school levels. 58 The takeaway from those studies 
is that low- and high-value-added teachers are doing something different in the 
classroom, and the evaluators observing these teachers are seeing that difference 
and scoring it, even though the evaluators themselves have no knowledge of where 
a teacher might be in the value-added distribution. Finding this same kind of 
relationship between value-added and classroom observation scores at the high 
school level would provide an additional level of confidence in what are likely to 
be relatively complex high school value-added models. 59 
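
To make this test concrete, the sketch below computes a Pearson correlation between teachers’ value-added estimates and their classroom observation scores. The function name and the sample numbers are hypothetical and purely illustrative. The studies cited above may well have used different statistics or model-based estimates, so this should be read as a minimal example of the idea rather than as their method.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per teacher, in the same order in both lists.
value_added = [-0.20, -0.05, 0.00, 0.10, 0.25]   # e.g., in student test-score SD units
observation = [2.1, 2.6, 2.4, 3.0, 3.4]          # e.g., on a 1-4 rubric scale

r = pearson_r(value_added, observation)
```

A clearly positive `r` on real data would be the kind of evidence described above. With only a handful of teachers, as in this toy example, any such estimate would be far too noisy to support conclusions.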


More research focused on SLOs is needed 

Given the increasing use of this evaluation method, we need more information 
on the extent to which SLO scores are related to a teacher’s ability to promote 
student learning gains. Absent this evidence we have to trust that SLOs are mea- 
suring effective teaching. We need research that can determine whether the SLO- 
setting process makes teachers better and raises the general level of teaching on a 
campus and in a district as some advocates claim. Given that SLOs may be used 
more at the high school level than at the elementary level, this research evidence is 
particularly important for high school evaluation. 


More work needs to be done on the possibility of developing 
assessments that can gauge a teacher's ability to teach his or her 
content 


We first need to know more about the extent to which these content-specific peda- 
gogical assessments are related to a teacher’s ability to promote student learning in 
the content. If we find there is promise on that critical dimension, then that would 
be cause for a push to develop more of these assessments across additional high 
school content areas. 


Teacher evaluation should take the lead in finding a way to use 
technology for efficiency and productivity gains 

Education has a dismal record in effectively using technology. If teacher evaluation 
is to have the impact envisioned by many, we may not be able to afford to follow 
this pattern. Especially in the realm of practice-based evaluation, technology holds 
the promise of allowing us to do more, do it better, and do it at lower cost, particularly 
at the high school level. The promise of digital video technology is that all teachers 
in a high school could be “observed” several times during the year via video, these 
teaching episodes could be viewed by content specialists, and the total costs would 
be less than the costs of training, administrative support, transportation, and 
salary for district-based, full-time evaluators. In addition, as the district builds a video 
collection of exemplary teaching sessions, it will develop an in-house bank of examples 
that novice teachers and others can view as professional development exercises. 


*** 


The intense work being done across the nation is rooted in the belief that if we can 
do a better job at teacher evaluation, the ultimate result will be better outcomes for 
our kids and, by extension, our nation. A large part of fulfilling that promise rests 
with how well new evaluation systems perform at the high school level. An impres- 
sive array of high-performing teachers and administrators, evaluation experts, 
economists and statisticians, technology leaders, and state and federal policymakers 
are currently working on teacher evaluation issues, with much of this work focusing 
on the high school question. The urgency with which this work is being done and 
the timing of much of the grant money tied to new evaluation initiatives suggest 
that in the next few years, new evaluation systems will be in place. It is fair to say that 
just as putting teacher evaluation on the front burner in a district can change the way teachers 
talk about their craft, the national effort currently in place is changing the way we 
talk about teaching and how we can and should evaluate teachers across the nation, 
including those in our high school classrooms. 